When Templeton starts, it loads any default or specified configuration files. These files use a simple option-value format: each option is set by a single keyword followed by the desired value.
The options may be classified into groups:
Although the keywords are not case sensitive (you may use upper or lower case letters), the parameters for some options are case sensitive, including options where a text string or URL is used.
Lines beginning with a "#" character are treated as comments.
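The option-value format described above can be read with a few lines of code. The following Python sketch is illustrative only and is not Templeton's own parser: the function name parse_config is hypothetical, and it assumes that blank lines are ignored and that the keyword is separated from its value by whitespace.

```python
def parse_config(text):
    """Parse option-value lines into a list of (keyword, value) pairs.

    Keywords are lower-cased because they are not case sensitive. Values
    are kept verbatim because some of them (text strings, URLs) are case
    sensitive. A list is returned, not a dict, because some options may
    appear more than once.
    """
    options = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines are skipped, like "#" comment lines.
        if not line or line.startswith("#"):
            continue
        # The keyword is the first word; the rest of the line is the value.
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        value = parts[1] if len(parts) > 1 else ""
        options.append((keyword, value))
    return options


sample = """\
# Sample configuration
RestrictHost TRUE
restrictpath /people
Deny http://loco.com/archive/
"""

for keyword, value in parse_config(sample):
    print(keyword, "=", value)
```

Returning pairs instead of a dictionary keeps repeated options intact, which matters for any option that may be listed on several lines.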
Register registration_code
example:
Register 12-34F67-891011
Software that is registered contains a unique registration code. This code should be entered exactly as it is provided. If your site has multiple registrations, you may list each registration code on its own line starting with the keyword "Register".
Please read the licensing agreement for registration information.
RestrictHost boolean
example:
RestrictHost TRUE
This parameter informs the program not to leave the designated host. Links to machines not on the current host are not traversed.
RestrictPath absolute path
example:
RestrictPath /people
This parameter is only used when a host is restricted. When a host is restricted, a subpath on that host may also be restricted. Hypertext references to documents outside this subtree are not traversed. Either a slash "/" or a backslash "\" is valid for specifying a directory. The trailing slash or backslash is optional.
RestrictDepth numeric value
example:
RestrictDepth 3
Hyperlinks are traversed in a breadth-first search. An unrestricted search may download an entire WWW server's data. By restricting the depth, only immediate portions of the server will be received. Images and non-href links are considered to be at the same depth as the document.
A restricted depth of 0 means no restriction. The default value is 1.
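A depth-limited breadth-first traversal can be sketched as follows. This Python example is a hedged illustration, not Templeton's code: the function crawl_order and the in-memory links table are hypothetical, and it assumes that the starting document counts as depth 1 (the documentation does not say how depths are numbered).

```python
from collections import deque


def crawl_order(links, start, restrict_depth=1):
    """Return documents in breadth-first order, honoring a depth limit.

    links: dict mapping each document to the documents it links to
           (a hypothetical stand-in for fetching real pages).
    restrict_depth: 0 means no restriction. Otherwise the start document
           is ASSUMED to be at depth 1, its links at depth 2, and so on.
    """
    visited = [start]
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        doc, depth = queue.popleft()
        if restrict_depth and depth >= restrict_depth:
            continue  # following links from here would exceed the limit
        for target in links.get(doc, []):
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append((target, depth + 1))
    return visited


site = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"]}
for limit in (1, 2, 3, 0):
    print(limit, crawl_order(site, "a", limit))
```

Because the queue is processed first-in, first-out, every document at one depth is handled before any document at the next depth, which is what makes a simple depth cutoff sufficient.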
RemoveRestricted boolean
example:
RemoveRestricted FALSE
This parameter informs the program to remove untraversed links. Links to restricted machines or restricted depths are removed from the HTML file, but the visible text is still available (just not as a hyperlink). The default value is FALSE.
Exclusion boolean
example:
Exclusion TRUE
This parameter determines whether Templeton will support server-provided robot exclusion files (robots.txt). Many servers maintain exclusion files to prevent robots from wandering around virtual directory trees or from retrieving very temporary files, incomplete files, or copyrighted materials. It is considered "polite" for web agents to obey the exclusion files when they exist. The default value, TRUE, means that robot exclusion files are obeyed. Setting Exclusion to FALSE will ignore robot exclusion files.
Robot exclusion files that explicitly restrict Templeton will be honored regardless of the Exclusion parameter.
Deny URL
example:
Deny http://foo.com/archive/
The URL provided, as well as all subtrees of the URL, are not processed. Many times specific directory subtrees are not desirable. You can deny retrieval of these URLs using this setting.
For example, to NOT retrieve the "archive" subtree of the host loco.com, you would specify:
Deny http://loco.com/archive/
If you do not include the trailing slash (http://loco.com/archive) then all subdirectories beginning with "archive" are not processed. This includes "archive.1", "archive.old", "archive_from_1994", etc.
Multiple Deny statements may be specified.
Allow URL
example:
Allow http://foo.com/archive/January/
Similar to "Deny", "Allow" explicitly specifies that a subtree is retrievable. When used in conjunction with Deny URL, branches of a subtree may be specified for access, while other subtrees are ignored.
Multiple Allow statements may be specified.
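The interaction of Deny and Allow, and the effect of the trailing slash, can be illustrated with plain prefix matching. The Python sketch below is an assumption about how such rules could be evaluated, not a description of Templeton's internals: the function is_retrievable is hypothetical, and it assumes that the longest matching prefix wins, so that an Allow entry nested inside a denied subtree re-opens that branch.

```python
def is_retrievable(url, deny, allow):
    """Decide whether a URL may be retrieved.

    ASSUMPTION: rules are compared as string prefixes and the longest
    matching prefix wins. URLs matching neither list are retrievable.
    """
    best_deny = max((len(p) for p in deny if url.startswith(p)), default=-1)
    best_allow = max((len(p) for p in allow if url.startswith(p)), default=-1)
    return best_allow >= best_deny


deny = ["http://foo.com/archive/"]
allow = ["http://foo.com/archive/January/"]

for url in (
    "http://foo.com/index.html",
    "http://foo.com/archive/February/a.html",
    "http://foo.com/archive/January/a.html",
):
    print(url, is_retrievable(url, deny, allow))

# Without the trailing slash, the prefix also matches sibling directories.
print(is_retrievable("http://loco.com/archive.old/x.html",
                     ["http://loco.com/archive"], []))
print(is_retrievable("http://loco.com/archive.old/x.html",
                     ["http://loco.com/archive/"], []))
```

Prefix matching explains why "http://loco.com/archive" also blocks "archive.old": the denied string is simply the start of the longer URL.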
Sleep seconds
example:
Sleep 10
Sleep determines the number of seconds to pause before sending a request to a WWW server. SLEEP IS IMPORTANT.
Warning: Templeton can generate thousands of requests per minute. Many WWW servers cannot handle a sudden onslaught of requests. Setting the Sleep parameter to 0 (zero) may generate too many requests for the server and kill the server. This is bad.
A sleep setting of 0 is known to kill the following types of servers:
For safety, you should set the sleep interval to at least 5 seconds. The longer, the better. Remember, this program is automated and can easily run for hours. What's the rush?
Unregistered versions of Templeton cannot have a sleep period less than 5 seconds.
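The pause-before-request behavior can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Templeton's implementation: the names effective_sleep and fetch_all are hypothetical, the fetch function is supplied by the caller, and a too-small interval in an unregistered copy is assumed to be raised to 5 seconds (the documentation does not say whether such a value is adjusted or rejected).

```python
import time

MIN_UNREGISTERED_SLEEP = 5  # seconds; the stated floor for unregistered copies


def effective_sleep(requested, registered):
    """Return the sleep interval that would actually be applied.

    ASSUMPTION: an unregistered copy silently raises a smaller value to
    the 5-second minimum instead of reporting an error.
    """
    if registered:
        return max(requested, 0)
    return max(requested, MIN_UNREGISTERED_SLEEP)


def fetch_all(urls, fetch, sleep_seconds, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing before every request.

    The sleep function is a parameter so that the pause can be replaced
    (for example by a recorder) when experimenting.
    """
    results = []
    for url in urls:
        sleep(sleep_seconds)  # pause before sending the request
        results.append(fetch(url))
    return results


print(effective_sleep(0, registered=False))
print(effective_sleep(10, registered=False))
```

With a 5-second interval, a run over 1,000 documents spends at least 5,000 seconds (about 83 minutes) waiting, which is the intended trade-off: the server sees at most one request every few seconds.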
LocalPath absolute path
example:
LocalPath /
LocalPath informs the program where to store the downloaded files. If this path is:
LocalPath none
then no files are generated. Only a log file containing the remote server's WWW map is created in the current directory.
Currently, files should be stored in the root directory of the file system. For WWW servers, this is the server's root directory. (This limitation will be removed in future releases.) For DOS based machines, this path may include a drive letter:
LocalPath e:\server.www\
Either a slash "/" or a backslash "\" is valid for specifying a directory. The trailing slash or backslash is optional.
FileOverwrite boolean
example:
FileOverwrite TRUE
Files that already exist on the local system are not normally downloaded. Setting the FileOverwrite option to TRUE will overwrite files on the local file system. The default value is FALSE.
ISMAP absolute path to executable
example:
ISMAP /cgi-bin/imagemap
For WWW servers, many imagemaps use a program that takes coordinates from a selected image <IMG SRC=... ISMAP> and returns a new URL. Some of the more common methods use a data file containing known coordinates and a program to identify which URL is activated. Commonly, this program is called "imagemap" or "imagemap.exe".
The ISMAP parameter specifies the WWW server's path to the imagemap program.
MapType NCSA or CERN
example:
MapType NCSA
For the executable specified in the ISMAP parameter, this option determines the format of the file. If the image map file can be retrieved, then it is converted into this specified format. Valid options are either "CERN" or "NCSA". The default is NCSA.
FATFormat boolean
example:
FATFormat FALSE
Determines the file name format for the current operating system. DOS based machines using drives formatted with a File Allocation Table (FAT) can only handle file names containing 8 characters and a 3 character extension. Setting this option to TRUE will generate 8.3 character file names. The default is FALSE, which generates file names of unlimited length.
NOTE: Under DOS, this option is always TRUE (DOS only supports FAT file names). Under OS/2, this value becomes TRUE automatically if the destination path (LocalPath) is located on a FAT partition.
Index file name
example:
Index index.html
For hypertext references that only specify a directory, this is the default HTML file in the directory.
NOTE: if FATFormat is TRUE, the 8.3 name translation will be applied to this file name.
The default name is "index.html".
Server-File filename
example:
Server-File serverfile
A data file is generated containing the host name, IP address, and WWW server type for each server visited. For servers listed as IP address only, the host name is also the IP address.
The default is no server logging.
Mailto-File filename
example:
Mailto-File mailtofile
Similar to Server-File logging, the file name listed on the "Mailto-File" line contains a list of e-mail addresses found in the HTML documents. Only e-mail addresses that are active (hyperlinks) are used. E-mail addresses displayed as plain text in the document or contained in CGI scripts are not listed in the mailto logfile.
NOTE: This list MAY contain duplicate entries. Duplication removal may be added in later versions. The mailto log is a very useful feature for generating mailing lists.
The default is no mailto logging.
User e-mail address
example:
User webmaster@host.machine.org
In case of emergency, this is the person who is running the program and who should be contacted to stop the program from running. This MUST be a valid e-mail address, and SHOULD also be available with a network "talk" command.
As a side note, it is never a good idea to let automatic software run unsupervised (especially this type of software). The "User" should be available to read their e-mail at all times during the execution of this program.
The default is the user running the program on the current machine, and the IP address of the current machine. For operating systems with no user accounts, such as DOS or OS/2, the username is taken from the USER environment variable, or "root" if the variable is undefined.
ProxyHost hostname or IP address
example:
ProxyHost proxyhost.network.net
Proxy agents are machines that act as a gateway through a firewall. If your local network uses a proxy agent, specify the name of the proxy agent here. If you are uncertain about your network, consult your network manager or provider.
A proxy server is used only when one is specified with ProxyHost.
ProxyPort integer
example:
ProxyPort 80
When using a proxy server, the port on the proxy server should be specified. The default port is 80. This value is not used if no proxy host is specified with ProxyHost.
Spoof text-string
example:
Spoof Mozilla (Templeton)
Some WWW servers make incorrect assumptions about browsers and robots. (Most of these are the Netscape servers.) These servers assume that, since the browser is not "Netscape", the browser cannot handle the HTML documents, and therefore the document is not transferred. By "spoofing" a different name, the WWW robot can use a qualified browser name to retrieve the HTML document.
NOTE: The first word of the spoof-name is used for restrictions when robot exclusion is honored (see Exclusion). This means, if Templeton tells the WWW server that it is "Netscape" and the server does not permit Netscape browsers, then the server will also not permit Templeton.
Common spoof names (and browsers) are:
Add URL
example:
Add http://www.cs.tamu.edu/people/
This configuration option adds a URL to the list to be processed. Restrictions are applied. The "Add" feature makes automated operation of Templeton easy.
Multiple Add statements may be specified.
NOTE: Templeton has the capability to spawn thousands of applications in a few seconds. On Unix-type systems, Templeton introduces security risks when executed as root.
For applications that are not spawned, Templeton will pause until the application has ended. This allows for a guaranteed order of processing for the called applications.
Command_html string
example:
Command_html /usr/local/bin/html2txt %s
Execute a system command on each HTML document stored on the file system. This may be useful for counting documents, storing statistics, printing, converting, etc. The string should contain the executable to run and a "%s" for the file name. The string "none" turns off this command. This is the default.
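The "%s" substitution can be sketched in Python as shown here. This is a hedged illustration only: the function build_command is hypothetical, it assumes that "none" is recognized regardless of case, and it substitutes the file name verbatim because the documentation does not describe any quoting of file names.

```python
def build_command(template, filename):
    """Return the command line for one stored file, or None when disabled.

    ASSUMPTIONS: "none" is matched without regard to case, and the file
    name replaces "%s" verbatim with no shell quoting.
    """
    if template.strip().lower() == "none":
        return None
    return template.replace("%s", filename)


print(build_command("/usr/local/bin/html2txt %s", "/people/index.html"))
print(build_command("none", "/people/index.html"))
```

Because the file name is inserted verbatim in this sketch, names containing spaces or shell metacharacters would need quoting before such a command line is handed to a shell.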
Command_image string
example:
Command_image /usr/local/bin/viewpict %s
Execute a system command on each image file stored on the file system. Similar to Command_html, Command_image is executed on all image files. This may be useful for counting documents, storing statistics, printing, converting, etc. NOTE: No distinction is made between different image formats. The string should contain the executable to run and a "%s" for the file name. The string "none" turns off this command. This is the default.
Command_map string
example:
Command_map echo %s >> maplog
Execute a system command on each image-map stored on the file system. Similar to Command_html, Command_map is executed on all image-map files. This may be useful for counting documents, storing statistics, or converting. The string should contain the executable to run and a "%s" for the file name. The string "none" turns off this command. This is the default.
Command_default string
example:
Command_default echo %s >> filelist
Execute a system command on each file stored on the file system. Similar to Command_html, Command_default is executed on all files that have no other executable specified. This may be useful for counting documents, storing statistics, printing, converting, etc. The string should contain the executable to run and a "%s" for the file name. The string "none" turns off this command. This is the default.
Interactive boolean
example:
Interactive TRUE
Interactive determines whether the user should be prompted for configuration information or if Templeton should start running automatically. The default setting is TRUE, causing Templeton to prompt for user interaction.